We claim:

1. A system for estimating prevalence of digital content on the World-Wide-Web, comprising:

an estimating device for estimating the global traffic to a plurality of Web sites to provide traffic data;

a sampling device for statistically sampling the contents of said plurality of Web sites to provide sampling data;

a storage device for storing said traffic data and said sampling data; and

an accessing device for accessing said traffic data and said sampling data stored in said storage device.

2. The system of claim 1, wherein said estimating device is a globally distributed set of proxy cache servers.

3. The system of claim 1, wherein said estimating device computes for each Web site, the number of impressions of an advertisement on a Web page on said each Web site.

4. The system of claim 1, wherein said sampling device includes:

a prober for periodically fetching pages from each Web site;

an extractor for extracting fragments from said pages; and

a classifier for classifying said fragments.

5. The system of claim 1, wherein said accessing device generates reports in accordance with a predetermined criteria.

6. A method of estimating prevalence of digital content on the World-Wide-Web, comprising the steps of:

estimating the global traffic to a plurality of Web sites to provide traffic data;

statistically sampling the contents of said plurality of Web sites to provide sampling data;

storing said traffic data and said sampling data;

accessing said traffic data and said sampling data stored in said storage device to generate reports.

7. A system for estimating the prevalence of digital content on a network, wherein the network connects to at least one network site having at least one network server to access at least one uniform resource locator, the system comprising:

a database;

a traffic analysis system that receives a traffic data sample from a traffic sampling system and stores the traffic data sample in the database, wherein the traffic sampling system is connected to the network, and wherein the traffic data sample includes said at least one uniform resource locator;

a digital content sampling system connected to the network, wherein the digital content sampling system retrieves at least one digital content resource from said at least one uniform resource locator and stores said at least one digital content resource in the database; and

a statistical summarization system that creates summarization data that describes said at least one digital content resource and stores the summarization data in the database.

8. The system of claim 7, further comprising:

a Web front end connected to the network, wherein a client can use the Web front end to access the database, and wherein the client uses a browser to connect to the Web front end; and

9. The system of claim 7, further comprising:

a user interface that an account manager, operator, or media editor can use to administer the system.

10. The system of claim 7, wherein the network is the Internet, and wherein the network site is a Web site.

11. The system of claim 7, wherein the traffic analysis system further comprises:

an anonymity system that receives the traffic data sample from the traffic sampling system and produces a clean traffic data sample; and

a traffic summarization system that produces a summarization of the clean traffic data sample and stores the traffic data sample in the database.

12. The system of claim 11, wherein the anonymity system produces a clean traffic data sample by removing network address or cookie data from the traffic data sample.

13. The system of claim 11, wherein the summarization of the clean traffic data sample includes a reference to said at least one uniform resource locator and a tally of the number of times said at least one uniform resource locator was requested.

14. The system of claim 7, wherein the digital content sampling system further comprises:

a probe mapping system that uses the summarization data to create a probe map for the network, wherein the probe map includes a mapping for said at least one uniform resource locator;

a uniform resource locator retrieval system that retrieves said at least one uniform resource locator from the network server;

a browser emulation environment that conducts a simulation of the display of said at least one uniform resource locator in a browser;

a digital content extractor that retrieves said at least one digital content resource from said at least one uniform resource locator and stores said at least one digital content resource in the database;

a structural classifier that determines at least one classification type for said at least one digital content resource and stores said at least one classification type in the database; and

a statistical summarization of the prevalence of the digital content.

15. The system of claim 14, wherein the probe map comprises:

a probability of the likelihood that said at least one uniform resource location will be sampled; and

a scale that determines the contribution of said at least one uniform resource location to .

16. The system of claim 14, wherein the simulation includes executing a program embedded in said at least one uniform resource locator.

17. The system of claim 16, wherein the program is a JavaScript script, Java applet, Perl script, or common gateway interface program.

18. The system of claim 14, wherein the simulation includes executing dynamic digital content in said at least one uniform resource locator.

19. The system of claim 18, wherein the dynamic content is an interlaced GIF image, MPEG movie, or MP3 audio file.

20. The system of claim 14, wherein the digital content extractor retrieves said at least one digital content resource from said at least one uniform resource locator by applying a rule set defined by a media editor.

21. The system of claim 14, wherein the digital content extractor retrieves said at least one digital content resource from said at least one uniform resource locator by using an automated digital content detection system.

22. The system of claim 21, wherein the automatic digital detection system comprises:

a structural detector that locates particular XML structures; and

a feature detector that locates particular XML features within said structures.

23. The system of claim 14, wherein the structural classifier determines said at least one classification type for said at least one advertisement.

24. The system of claim 7, wherein the user interface comprises:

a system account management interface, wherein the account manager uses the system account management interface to create and modify an account for the client on the system;

a site administration interface, wherein the operator uses the site administration interface;

a taxonomy administration interface, wherein the media editor uses the taxonomy administration interface;

an advertising content classification interface, wherein the media editor uses the advertising content classification interface; and

a rate card collection interface, wherein the media editor uses the rate card collection interface.


25. A system for estimating prevalence of dynamic content on a network, comprising:

a memory device; and

a processor disposed in communication with said memory device, said processor configured to:

collect a sample of traffic data to a plurality of Web sites;

compute a number of impressions of a Web advertisement from each of a plurality of Web sites to generate traffic data,

retrieve sample contents of each of said Web sites to generate sampling data, and

generate prevalence estimates of said dynamic content from said traffic data and said sampling data.

26. The system of claim 25 wherein said processor is further configured to sample said contents by retrieving Web pages from each of said Web sites, extract fragments from said Web pages and classify said fragments.

27. The system of claim 25 wherein said processor is further configured to generate said traffic data by retrieving anonymous traffic data samples.

28. The system of claim 27 wherein said processor is configured to retrieve anonymous data samples by removing data from traffic data samples which identify users on said network.

29. The system of claim 25 wherein said processor is further configured to classify fragments within said sampling data.

30. The system of claim 29 wherein said processor is further configured to classify fragments by analyzing each fragment for uniqueness, and adding information to a database regarding the uniqueness of said fragment.

31. The system of claim 30 wherein said processor is configured to classify said fragments by detecting duplicate fragments.

32. The system of claim 25 wherein said processor is further configured to interact with a user interface for use in administering said system.

33. The system of claim 25 wherein said processor is further configured to generate said traffic
data to include uniform resource locator information regarding said plurality of Web sites. 

34. The system of claim 25 wherein said processor is further configured to perform data 
integrity monitoring of said sample data. 

35. The system of claim 25 wherein said processor is configured to serve as an automatic 
advertisement detection system. 

36. The system of claim 35 wherein said processor is configured to serve as an automatic 
advertisement detection system by using heuristics to detect advertising within HTML or XML 
documents, and normalizing detected HTML or XML content into a hierarchical form. 

37. A method for using a computer to estimate prevalence of dynamic content on a network, 
comprising: 

computing a number of impressions of a Web advertisement from each of a plurality of Web 
sites to generate traffic data; 

retrieving sample contents of each of said Web sites, using said computer, to generate 
sampling data; and 

generating prevalence estimates of said dynamic content from said traffic data and said 
sampling data. 



Docket No.: 4127-4000 



Morgan & Finnegan, L.L.P. 

38. The method of claim 37 wherein said retrieving comprises retrieving Web pages from each of said Web sites, extracting fragments from said Web pages and classifying said fragments.

39. The method of claim 37 wherein said traffic data is generated by retrieving anonymous traffic data samples.

40. The method of claim 39 wherein said retrieving comprises retrieving anonymous data samples by removing data from traffic data samples which identify users on said network.



41. The method of claim 37 further comprising classifying fragments within said sampling data.

42. The method of claim 41 wherein said classifying fragments comprises analyzing each fragment for uniqueness, and adding information to a database regarding the uniqueness of each said fragment.

43. The method of claim 42 further comprising classifying said fragments by detecting duplicate fragments.

44. The method of claim 37 further comprising interacting with a user interface to administer said system.

45. The method of claim 37 further comprising generating said traffic data to include uniform resource locator information regarding said plurality of Web sites.

46. The method of claim 37 further comprising performing data integrity monitoring of said sample data.

47. The method of claim 37 further comprising performing automatic advertisement detection by using heuristics to detect advertising within HTML or XML documents, and normalizing detected HTML or XML content into a hierarchical form.



48. A computer readable medium comprising:

code for computing a number of impressions of a Web advertisement from each of a plurality of Web sites to generate traffic data;

code for retrieving sample contents of each of said Web sites to generate sampling data; and

code for generating prevalence estimates of dynamic content from said traffic data and said sampling data.

49. The computer readable medium of claim 48 further comprising code to extract fragments from said Web pages and classify said fragments.

50. A system for estimating prevalence of dynamic content on a network, comprising:

means for computing a number of impressions of a Web advertisement from each of a plurality of Web sites to generate traffic data;

means for retrieving sample contents of each of said Web sites, using said computer, to generate sampling data; and

means for generating prevalence estimates of said dynamic content from said traffic data and said sampling data.

51. The system of claim 50 further comprising:

means for classifying fragments extracted from said Web pages.

52. The system of claim 50 further comprising:

means for anonymizing said traffic data.

53. A system of estimating prevalence of dynamic content on the World-Wide-Web, comprising:

means for estimating global traffic to a plurality of Web sites to provide traffic data;

means for statistically sampling the contents of said plurality of Web sites to provide sampling data;

means for storing said traffic data and said sampling data; and

means for accessing said traffic data and said sampling data stored in said storage device to generate prevalence estimates and reports therefrom.